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Abstract 

Background: The ciliate Paramecium bursaria harbors several hundred cells of the green-alga Chloreila sp. in their 
cytoplasnn. Irrespective of the nnutual relation between P. bursaria and the symbiotic algae, both cells retain the 
ability to grow without the partner. They can easily reestablish endosymbiosis when put in contact with each other. 
Consequently, P. bursaria is an excellent model for studying cell-cell interaction and the evolution of eukaryotic cells 
through secondary endosymbiosis between different protists. Despite the importance of this organism, no genomic 
resources have been identified for P. bursaria to date. This investigation compared gene expressions through 
RNA-Seq analysis and de novo transcriptome assembly of symbiont-free and symbiont-bearing host cells. 

Results: To expedite the process of gene discovery related to the endosymbiosis, we have undertaken lllumina 
deep sequencing of mRNAs prepared from symbiont-bearing and symbiont-free P. bursaria cells. We assembled the 
reads de novo to build the transcriptome. Sequencing using lllumina HiSeq2000 platform yielded 232.3 million 
paired-end sequence reads. Clean reads filtered from the raw reads were assembled into 68,175 contig sequences. 
Of these, 10,557 representative sequences were retained after removing Chlorella sequences and lowly expressed 
sequences. Nearly 90% of these transcript sequences were annotated by similarity search against protein databases. 
We identified differentially expressed genes in the symbiont-bearing P. bursaria cells relative to the symbiont-free 
cells, including heat shock 70 kDa protein and glutathione S-transferase. 

Conclusions: This is the first reported comprehensive sequence resource of Paramecium - Chlorella endosymbiosis. 
Results provide some keys for the elucidation of secondary endosymbiosis in P. bursaria. We identified P. bursaria 
genes that are differentially expressed in symbiont-bearing and symbiont-free conditions. 
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Background 

As demonstrated by the evolution of mitochondria and 
chloroplasts, endosymbiosis is a major driving force behind 
eukaryotic cell evolution leading to acquisition of new 
intracellular components and cell diversity [1,2]. Although 
endosymbiosis is an important and widespread pheno- 
menon, the mechanisms controlling the establishment of 
endosymbiosis between different eukaryotic cells are not 
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well understood. In fact, P. bursaria cells harbor about 700 
symbiotic algae in their cytoplasm [3]. Each alga is en- 
closed in a perialgal vacuole (PV) membrane derived from 
the host digestive vacuole (DV) membrane, which protects 
the alga from the hosts lysosomal fusion [4-6]. Irrespective 
of the mutual relations between P, bursaria and symbiotic 
algae [7-11], the symbiont-free cells and the symbiotic 
algae retain the ability to grow without a partner. 
Symbiont-free cells can be prepared by various means: 
cultivation under constant dark conditions [12-14], 
treatment with cycloheximide [3,15,16], and treatment 
with the photosynthesis inhibitor dichlorophenyl dime- 
thylurea (DCMU) [17]. However, symbiotic algae can be 
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isolated by homogenization or by sonication or by the 
treatment of symbiotic cells with detergent. They can 
grow outside host cells [18]. Symbiont-free cells are easily 
reinfected with symbiotic algae by mixing the two toge- 
ther. Therefore, P, bursaria has been considered an ex- 
cellent model for studying cell-cell interaction and the 
evolution of eukaryotic cells through secondary endosym- 
biosis between different protists [19]. However, neither 
genomic nor transcriptomic information has been avail- 
able to elucidate the establishment of endosymbiosis in 
P, bursaria to date. To expedite the process of gene 
discovery related to the endosymbiosis, we have under- 
taken Illumina deep sequencing of mRNAs prepared from 
symbiont-bearing and symbiont-free P. bursaria cells in 
this study. Our data provide a comprehensive sequence 
resource for the advancement of P, bursaria study. 

Results and discussion 

Deep-sequencing and assembly 

We constructed three RNA-seq libraries from mRNA of 
P, bursaria harboring symbiotic alga, Chlorella variabilis^ 
and three libraries from symbiont-free P. bursaria. Se- 
quencing using Illumina HiSeq2000 platform yielded 
232.3 million 101 -by- 101 bp paired-end sequence reads. 
After trimming the low-quality parts and removing reads 
of less than 50 bp, 436.9 million reads (42.9 Gb) remained. 
To obtain a comprehensive sequence set of the P, bursaria 
transcriptome, all the clean reads of symbiont-bearing and 
symbiont-free libraries were assembled together using the 
Trinity program [20]. The de novo assembly produced 
68,175 contigs, clustering into 40,805 subcomponents (i.e. 
unigenes). We selected the longest transcript as the repre- 
sentative for each cluster. The unigene sizes were 200 bp 
up to 22,858 bp, with mean length of 904 bp, N50 of 
1,832 bp totaling 36,894,860 bp for all unigenes; 9,620 
(23.6%) of unigenes were longer than 1,000 bp. 

We excluded unigenes derived from the symbiotic 
Chlorella and other contaminants. Of the 68,175 contig 
sequences, 11,256 were matched to the C. variabilis 
sequences, and were therefore removed. Unigenes lowly 
expressed with log-counts-per-million (logCPM) < 0 were 
also discarded because they are likely to be contaminant 
sequences or poor assembly models. Based on the data- 
base search, the small amount of the contaminant se- 
quences appears to be derived from some bacteria such as 
Methylobacterium and Burkholderiales, which are likely to 
be included in the culture media in which we grew P, bur- 
saria. These procedures produced P, bursaria transcript 
reference sequences composed of 10,557 unigenes. 

Annotation of the assembled contigs 

We performed similarity searches of the 10,557 P, bur- 
saria unigenes against the Swiss-Prot and UniRef90 pro- 
tein sequence databases [21] using BLASTX [22] with the 



E-value cutoff of le-5 and assigned the functional annota- 
tions of the most similar protein sequences. Of the 10,557 
unigenes, 7,051 (67%) had matches with 4,102 unique 
records in the Swiss-Prot database; 9,536 (90%) had 
matches with 8,189 unique records in the UniRef90 data- 
base. The species distribution of the BLASTX best hits in 
the UniRef90 database showed that 8,710 (91.7%) of the 

9.502 hits had top matches with sequences from P, tetra- 
urelia, followed by Tetrahymena thermophila with 153 
(1.6%) best BLASTX hits. 

We predicted open reading frames (ORFs) from the 
10,557 P, bursaria unigene sequences using OrfPredictor 
[23]. Of the 10,557 ORFs, 10,535 were longer than 50 
amino acids, 10,134 were longer than 100 amino acids, 
and 3,425 were longer than 500 amino acids. Although 
whole genome sequences have been clarified in P, tetra- 
urelia [24] and T, thermophila [25], endosymbiotic algae 
including Chlorella species have not yet been detected 
in these ciliates. Therefore, we tried to compare their 
ORFs length, GC%, and shared gene clusters among 
these two ciliates and P, bursaria to elucidate the genomic 
features of P, bursaria as a potential host cell for the sym- 
biotic algae. 

We compared ORFs (>100 amino acids) of P, bursaria 
with those of its close relatives P, tetraurelia and T, ther- 
mophila. The maximum values for lengths of ORFs for 
P, bursaria, P, tetraurelia, and T, thermophila were, re- 
spectively, 19,640, 21,570, and 34,740. The median 
values for lengths of ORFs for P, bursaria, P, tetraurelia, 
and T, thermophila were, respectively, 1,161, 1,155, and 

1.503 (Additional file 1). The corresponding values for 
G + C contents (%) of ORFs were 30.1, 29.7, and 27.8 
(Additional file 2). 

We identified 29,835 orthologous gene clusters contain- 
ing 70,114 ORFs from the three organisms: i.e. 10,134, 
37,382, and 22,598 ORFs longer than 100 amino acids 
long, respectively, from P, bursaria, P, tetraurelia, and T, 
thermophila. Figure 1 shows the number of orthologous 
gene clusters shared among the three organisms. Of the 
29,835 orthologous gene clusters, 3,421 were common to 
all three organisms, 2,854 were unique to Paramecium 
species, and 2,330 were unique to P, bursaria. 

Differential gene expression between symbiont-bearing 
and symbiont-free conditions 

We compared gene expressions of symbiont-bearing and 
symbiont-free P, bursaria to elucidate the genetic control 
for establishment of secondary symbiosis. Of the 10,557 
transcripts, 6,698 were significantly differentially expressed 
between symbiont-bearing and symbiont-free cells with 
false discovery rates (FDR) < 0.05 (Additional file 3). 

The positive and negative values of log2 Fold Change 
(logFC) show that the sequences were up-regulated and 
down-regulated in symbiont-bearing cells compared to 
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Paramecium bursaria Paramecium tetraurelia 




Tetrahymena thermophila 

Figure 1 Venn diagram siiowing the number of orthiologous 
gene clusters shiared among thiree organisms: P. bursaria (A), 
P. tetraurelia [B), and T. thermoptiila (C). Protein coding sequences 
from tlie tliree organisms were grouped into ortliologous gene 
clusters using OrtlioMCL [26]. 



symbiont-free cells. The parametric analysis of gene set 
enrichment (PAGE) [27] detected enrichment in the 17 
gene ontology (GO) terms, i.e. 8 biological processes, 3 
cellular components, and 6 molecular functions based 
on the logFC between symbiont-bearing and symbiont- 
free cells with false discovery rates (FDR) < 0.05 (Table 1). 
Figure 2 portrays a parent-child relation between GO 
molecular function terms including the six highest rank- 
ing GO terms based on PAGE values: oxidoreductase 
activity (Z = -6.879), structural constituent of ribosome 
(Z= -4.153), pyridoxal phosphate binding (Z= -4.015), 
phosphorelay response regulator activity (Z = 3.310), 
chromatin binding (Z = 3.107), and phosphorelay sensor 
kinase activity (Z = 2.901). A closer examination of individ- 
ual proteins assigned to these GO terms revealed that 
trans-2-enoyl-CoA (oxidoreductase activity), aminotrans- 
ferases (pyridoxal phosphate binding) and ribosomal pro- 
teins tended to be down-regulated, whereas transcriptional 
activator Myb-related proteins (chromatin binding), and 
signal transduction histidine kinase (phosphorelay response 
regulator activity and phosphorelay sensor kinase activity) 
tended to be up-regulated in symbiont-bearing cells rela- 
tive to symbiont-free cells. 

Based on the knowledge of P. bursaria accumulated to 
date, functions can be inferred for some of the six high- 
est ranking GO terms. Down-regulation of ribosomal 
proteins in symbiont-bearing P, bursaria cells suggests 



that algal proteins with functions equivalent to those of 
the host Paramecium cells are transferred to the host 
through the PV membrane. Consequently transcriptional 
activity of the host was reduced. However, no report to 
date has described a demonstration showing that the 
proteins synthesized by the algae and transferred to the 
host cell through photosynthetic products, mainly maltose, 
are transferred from the algae [7,9,11]. 

Up-regulation of signal transduction histidine kinase 
in symbiont-bearing P, bursaria cells suggests that the 
histidine kinases play an important role in signal percep- 
tion in this secondary symbiosis, as shown in the pri- 
mary symbiosis by bacteria [28] for the adaptation and 
survival of various organisms to harsh environmental con- 
ditions [29]. Sensor histidine kinases are known to play im- 
portant roles in several eukaryotic species including yeasts, 
fungi, slime molds, and higher plants [30-32]. Symbiont- 
bearing P, bursaria cells acquire various stress resistance 
through endosymbiosis with Chlorella spp [33-36]. 

In addition to the genes of enriched GO terms dis- 
cussed above, heat shock 70 kDa protein (Hsp70) and 
glutathione S-transferase (GST) genes were up-regulated 
and down-regulated as shown by the positive and nega- 
tive values of logFC, respectively, in symbiont-bearing 
cells compared to symbiont-free cells (Table 2). Of the 
10,557 unigenes, 8 were annotated as Hsp70 with logFC 
of -1.3 to 5.6, with a median of 0.92. Symbiont-bearing 
P, bursaria cells are known to show a higher survival 
ratio against nickel chloride, high temperature, and hydro- 
gen peroxide than the symbiont-free cells show [35,36]. 
Furthermore, P, caudatum cells reportedly acquire heat- 
shock resistance by infection of endonucler symbiotic 
bacteria Holospora [37-39], and osmotic-shock resistance 
[40]. Hori and Fujishima [39] reported that H, obtusa- 
bearing paramecia expressed high levels of Hsp70 mRNA 
even at 25°C. The up-regulation of the transcripts en- 
coding Hsp70 might be related to the hosts tolerance to 
environmental fluctuations. Of the 10,557 transcripts, 
7 were annotated as GST and tended to be down-regu- 
lated with logFC of -5.7 to -0.12, with a median of -0.85 
(Table 2). This enzyme is related to protection of cells 
from oxidative stress, as shown by McCord and Fridovich 
[41] and by Veal et al. [42]. Although it was conceivable 
that photo-oxidative stress in symbiont-bearing P, bur- 
saria cells is greater than that in symbiont-free ones, our 
data showed opposite results from the prediction. A simi- 
lar result was obtained by Hortnagl and Sommaruga [33] . 
They suggested that the presence of algal symbionts mini- 
mizes photo-oxidative stress [33]. Consequently, different 
expression levels in these genes between symbiont-free 
and symbiont-bearing P. bursaria agree well with differ- 
ences in cytological phenomena observed in these para- 
mecia, suggesting that these proteins appear to be involved 
in the symbiosis. Immunological detections of the gene 
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Table 1 Enrichment of gene ontology terms In differentially expressed sequences in P. bursaria detected by PAGE 



GO name 


GO_id 


Number of sequences 


Z score 


P-value 


False discovery rate 


BP oxidation-reduction process 


GO:0055114 


135 


-6.992 


2.71 E-1 2 


3.28E-10 


MF oxidoreductase activity 


GO:0016491 


92 


-6.879 


6.04E-12 


3.65E-10 


BP metabolic process 


GO:0008152 


104 


-5.659 


1 .52E-08 


6.15E-07 


BP carboliydrate metabolic process 


GO:0005975 


33 


-4.656 


3.23E-06 


9.76E-05 


CC integral to membrane 


GO:0016021 


196 


-4.399 


1 .09E-05 


0.0003 


BP translation 


GO:0006412 


56 


-4.187 


2.83E-05 


0.0006 


MF structural constituent of ribosome 


GO:0003735 


57 


-4.153 


3.29E-05 


0.0006 


CC ribosome 


GO:0005840 


54 


-4.044 


5.25E-05 


0.0008 


MF pyridoxal phosphate binding 


GO:0030170 


13 


-4.015 


5.96E-05 


0.0008 


MF phosphorelay response regulator activity 


GO:0000156 


40 


3.310 


0.0009 


0.0102 


BP phosphorelay signal transduction system 


GO:0000160 


40 


3.310 


0.0009 


0.0102 


BP ATP hydrolysis coupled proton transport 


GO:0015991 


16 


-3.239 


0.0012 


0.0121 


MF chromatin binding 


GO:0003682 


21 


3.107 


0.0019 


0.0176 


BP DNA replication 


GO:0006260 


18 


3.008 


0.0026 


0.0227 


CC intracellular 


GO:0005622 


95 


-2.953 


0.0031 


0.0254 


MF phosphorelay sensor kinase activity 


GO:0000155 


26 


2.901 


0.0037 


0.0281 


BP biosynthetic process 


GO:0009058 


16 


-2.798 


0.0051 


0.0366 



Notes: GO, gene ontology; PAGE, parametric analysis of gene set enrichment; BP, biological process; MF, molecular function; and CC, cellular component. We used 
log2 Fold Change values between symbiont-bearing and symbiont-free cells to calculate Z scores and corresponding P-values for each GO term. 



products and comparisons of the amount of the antigens 
or qualitative PGR between the symbiont-free and the 
symbiont-bearing paramecia are necessary for future expe- 
riments. To ascertain the molecular mechanisms con- 
trolling the secondary symbiosis between Paramecium and 
Chlorella cells, transcriptome analyses and proteome ana- 
lyses of the symbiotic Chlorella alone cultivated in algal 
culture medium and the same Chlorella inside the host 
Paramecium are necessary for future studies. 

Conclusion 

This study is the first whole transcriptome analysis con- 
ducted between symbiont-free and symbiont-bearing P, 
bursaria. Results showed P, bursaria genes that differ- 
entially expressed in symbiont-bearing and symbiont- 
free conditions. We showed that genes for glutathione 
S -transferase, trans-2-enoyl-CoA, aminotransferases, and 
ribosomal proteins are down-regulated, and that genes for 
Hsp70, transcriptional activator Myb-related proteins, and 
signal transduction histidine kinase are up-regulated in 
the symbiont-bearing P, bursaria. Our results enable us to 
understand the molecular mechanism for establishment of 
the secondary symbiosis and for the host evolutionary 
adaptation to global climate change. 

Methods 

Strains and cultures 

Symbiont-free P, bursaria strain Yadlw was produced 
from Chlorella sp. -bearing P, bursaria strain Yadlg cells 



(syngen 3, mating type I) through repeated cloning and 
cultivation under dark conditions. The Yadlg cell strain 
was collected in Yamaguchi, Japan in 2004. Symbiont- 
bearing strain of YadlglN cells was used for symbiont- 
bearing cells. The YadlglN strain was produced by in- 
fection of cloned symbiotic Chlorella variabilis (formerly 
C. vulgaris) strain 1 N cells to the Yadlw cell [15]. Para- 
mecium strains used for this study were provided by 
Symbiosis Laboratory, Yamaguchi University with support 
in part by the National Bio-Resource Project of the Ministry 
of Education, Culture, Sports, Science and Technology, 
Japan. The culture medium used was 1.25% (w/v) fresh let- 
tuce juice in modified Dryls solution (MDS) [43] (KH2PO4 
was used instead of NaH2P04 ♦ 2H2O), inoculated with a 
non-pathogenic strain of Klebsiella pneumoniae one day 
before use [44]. In ordinary cultures, several hundred cells 
were inoculated into 2 ml culture medium. Then 2 ml of 
fresh culture medium was added on each of the next 
12 days. One day after the final feeding, the cultures were 
in the early stationary phase of growth. All cells used in the 
present experiments were at this phase. Cultivation and all 
experiments were performed at 25 ± 1°C. In the case of 
the symbiont-bearing strain of YadlglN, the cells were 
cultivated under constant light conditions: 20-30 (imol 
photon/m^/s. 

Transcriptome sequencing 

Total RNA was isolated from 400,000 cells of symbiont- 
bearing and symbiont-free P, bursaria cells using Trizol 
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Figure 2 Parent-child relation between gene ontology (GO) molecular function terms viewed by AmiGO (http://amigo.geneontology.org/). 

Red lines show the six highest ranl<ing GO molecular function terms based on P-values in Table 1 . 



reagent (Invitrogen Corp.) according to the manufac- 
turers protocol To construct three RNA-seq libraries 
from mRNAs of P, bursaria, the total RNA was isolated 
from three independent cultures of symbiont-free and 
symbiont-bearing P. bursaria. After suspension in Trizol 
reagent, the symbiont-bearing and symbiont-free cells 
were stirred in the presence of 600 \A of 0.5 mm zirco- 
nia/silica beads (BioSpec Products Inc.) to break cell 
walls of the symbiotic algae. The RNA was treated with 
RNase-free DNase and was cleaned up using RNeasy 
Mini (Qiagen Inc.) according to the manufacturers 
protocol. RNA integrity was confirmed (2100 Bioanalyzer; 
Agilent Technologies Inc.) with a minimum RNA inte- 
grated number value of 7.6. The samples for transcrip- 
tome analyses were prepared using Illuminas kit following 
the manufacturers recommendations. First, mRNA was 
purified from 0.5 (ig of total RNA of symbiont-bearing or 



symbiont-free cells using oligo (dT) magnetic beads. Fol- 
lowing purification, the mRNA was fragmented into small 
pieces using divalent cations under elevated temperature. 
The cleaved RNA fragments were used for first strand 
cDNA synthesis using Superscript II Reverse Transcript- 
ase (Invitrogen Corp.) and random primers. Second strand 
cDNA synthesis was conducted next. These cDNA frag- 
ments then went through an end repair process and 
ligation of adapters. These products were purified and 
enriched with PCR to create the final cDNA library. 
Multiplex sequencing of paired-end reads was performed 
on an Illumina Hiseq2000 instrument at NIBB Core 
Research Facilities, followed by raw data processing, base- 
calling, and quality-control by manufacturers standard 
pipeline using RTA, OLB, and CASAVA. The output se- 
quence quality was inspected using the FastQC program 
(http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/). 
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Table 2 Transcripts encoding glutathione S-transferase and heat shock 70 kDa protein in P. bursaria 



Trinity transcript name 



Annotation from the SwissProt database 



logFC 



Heat shock 70 kDa protein 

comp43044_c0 

comp36402_c4 

comp36402_c6 

comp36402_cl 

comp37280_cl 

connp43771_c0 

comp41901_c0 

comp41912_c0 

Glutathione S-transferase 

comp37410_c0 

comp32377_c0 

comp36943_c0 

comp37841_c0 

comp36483_c0 

comp35816_cl 

comp36242_c0 



sp|P09446|HSP7A_CAEEL|Heat shock 70 kDa protein A OS = Caenorhabditis elegans 
sp|P09446|HSP7A_CAEEL|Heat shock 70 kDa protein A OS = Caenorhabditis elegans 
sp|P14834|HSP70_LEIMA|Heat shock 70 kDa protein (Fragment) OS = Leishmania major 
sp|Q9S9Nl|HSP7E_ARATH|Heat shock 70 kDa protein 5 OS =Arabidopsis thaliana 
sp|P37899|HSP70_PYRSA|Heat shock 70 kDa protein OS = Pyrenomonas salina 
sp|P09446|HSP7A_CAEEL|Heat shock 70 kDa protein A OS = Caenorhabditis elegans 
sp|Q9S7C0|HSP7O_ARATH|Heat shock 70 kDa protein 14 OS = Arabidopsis thaliana 
sp|F4JMJl|HSP7R_ARATH|Heat shock 70 kDa protein M OS = Arabidopsis thaliana 

sp|P78417|GST01_HUMAN|Glutathione S-transferase omega-1 0S = Homo sapiens 
sp|Q9ZRT5|GS^l_ARATH|Glutathione S-transferase Tl OS = Arabidopsis thaliana 
sp|Q9ZVQ3|GSTZl_ARATH|Glutathione S-transferase Zl OS = Arabidopsis thaliana 
sp|Q9ZRT5|GS^l_ARATH|Glutathione S-transferase Tl OS = Arabidopsis thaliana 
sp|Q9ZRT5|GS^l_ARATH|Glutathione S-transferase Tl OS = Arabidopsis thaliana 
sp|P78417|GST01_HUMAN|Glutathione S-transferase omega-1 OS = Homo sapiens 
sp|P16413|GSTMU_CAVPO|Glutathione S-transferase B 0S = Cavia porcellus 



5.601 

4.183 

1.975 

1.555 

0.287 

-0.594 

-1.076 

-1.337 

-0.119 
-0.288 
-0.748 
-0.851 
-1.557 
-1.564 
-5.749 



De novo assembly 

The reads were cleaned up with cutadapt program by 
trimming low- quality ends (<QV30) and adapter sequen- 
ces, and by discarding reads shorter than 50 bp. De novo 
assembly of the clean reads was conducted using Trinity 
[20] in the paired-end mode with an option '-min_- 
kmer_cov = 2\ 

Differential expression analysis 

Data of two biological replicates were used in this 
analysis for each condition. Using scripts included the 
Trinity package suite [20], cleaned reads were aligned 
to the Trinity-assembled transcripts using Bowtie [45]. 
Then transcript abundance was estimated using RSEM 
[46]. We used the edgeR package [47] of Bioconductor to 
identify genes that are differentially expressed between the 
conditions. To adjust for library sizes and skewed expres- 
sion of transcripts, the estimated abundance values were 
normalized using the Trimmed Mean of M-values (TMM) 
normalization method [48] included in the edgeR package. 
Based on a negative binomial model implemented in 
edgeR, genes that were differentially expressed between 
symbiont-bearing and symbiont-free P, bursaria samples 
were identified. 

Functional annotation of assembled contigs 

Nucleotide sequences of Chlorella variabilis NC64A 
were obtained from the DOE Joint Genome Institute (JGI) 
site (http://genome.jgi.doe.gov/ChlNC64A_l/ChlNC64A_l. 
home.html). The assembled contigs that matched the 



Chlorella sequences indicated by MEGABLAST search 
(E-value cutoff of le-40) were excluded from analyses. We 
performed a BLASTX search (with the ciliate nuclear 
genetic code and E-value cutoff of le-5) of the P, bursaria 
transcripts against the UniProtKB Swiss-Prot and Uniref90 
protein sequence databases [21] and assigned the functional 
annotations of the most similar protein sequences. 

The protein-coding region of RNA sequences was 
predicted using OrfPredictor with the ciliate nuclear 
genetic code [23]. For sequences having BLASTX hits, 
the frames used by BLASTX were used for identifying 
the coding regions of the sequences. For sequences without 
BLASTX hits, the coding regions were predicted based on 
the intrinsic signals of the sequences. For comparative 
analysis, protein-coding sequences of P, tetraurelia were 
obtained from ParameciumDB (http://paramecium.cgm. 
cnrs-gif.fr). Those of T. thermophila were retrieved from 
Tetrahymena Genome Database (http://www.ciliate.org/). 
Proteins from the three organisms {P, bursaria, P. tetraure- 
lia, and r. thermophila) were grouped into ortholog clus- 
ters using OrthoMCL [26] with BLASTP E-value cutoff of 
le-5 and inflation parameter of 1.5. 

Gene ontology (GO) enrichment analysis 

We conducted a domain search of the P. bursaria tran- 
scripts against the Pfam database release 26.0 [49]. Gene 
ontology (GO) terms were assigned to each transcript using 
the pfam2go conversion table (http://www.geneontology. 
org/external2go/pfam2go) [50]. We performed parametric 
analysis of gene set enrichment or PAGE [27] to test over- 
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representation and under-representation of GO terms 
based on the logFC between symbiont-bearing and 
symbiont-free cells. 

Availability of supporting data 

The data sets supporting the results of this article are 
available in the DDBJ Sequence Read Archive (DRA) 
(accession number DRA000907, http://trace.ddbj.nig.ac. 
jp/DRASearch/submission?acc=DRA000907). 
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